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Abstract 


Gene body methylation (gbM) is an epigenetic mark where gene exons are methylated in the CG context only, as opposed to 
CHG and CHH contexts (where H stands for A, C, or T). CG methylation is transmitted transgenerationally in plants, opening 
the possibility that go>M may be shaped by adaptation. This presupposes, however, that goM has a function that affects 
phenotype, which has been a topic of debate in the literature. Here, we review our current knowledge of gbM in plants. 
We start by presenting the well-elucidated mechanisms of plant gobM establishment and maintenance. We then review 
more controversial topics: the evolution of goM and the potential selective pressures that act on it. Finally, we discuss the 
potential functions of goM that may affect organismal phenotypes: gene expression stabilization and upregulation, inhibition 
of aberrant transcription (reverse and internal), prevention of aberrant intron retention, and protection against TE insertions. 
To bolster the review of these topics, we include novel analyses to assess the effect of goM on transcripts. Overall, a growing 
body of literature Ttinds that goM correlates with levels and patterns of gene expression. It is not clear, however, if this is a 
causal relationship. Altogether, functional work suggests that the effects of gbM, if any, must be relatively small, but there 
is nonetheless evidence that it is shaped by natural selection. We conclude by discussing the potential adaptive character of 
gbM and its implications for an updated view of the mechanisms of adaptation in plants. 
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Significance 


Gene body methylation (hereafter goM) is acommon phenomenon in plants and can affect up to 60% of genes in some 
species. It has been controversial whether gobM has any function in plants, but recent findings suggest It is under selec- 
tion and correlated with fitness. Here, we review the scientific literature, include novel analyses, and discuss the potential 
role of gbM in rapid evolutionary change. 


© The Author(s) 2022. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. 
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https:/creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, 
distribution, and reproduction in any medium, provided the original work is properly cited. 
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Introduction 


Epigenetics is the study of changes in gene expression that 
can be inherited through cell divisions (either mitotic or 
meiotic) that are not due to modifications in the DNA se- 
quence (Holliday 1994; Cavalli and Heard 2019). A long- 
Standing question is whether epigenetics can play a role 
in adaptation (Charlesworth et al. 2017; Cavalli and 
Heard 2019; Boquete et al. 2021). Cavalli and Heard 
(2019) stated that “a direct demonstration that other mo- 
lecules, in addition to DNA [sequence], carry substantial 
heritable information would represent an important con- 
ceptual change in evolutionary biology.” Theoretically, epi- 
genetic marks may be the basis for this conceptual change. 
If epigenetic marks affect fitness, if they are inherited 
through generations, and if they epimutate over time, 
then they can be the target of selection and facilitate 
adaptation. 

Cytosine methylation is one common epigenetic mark 
that is generally found in eukaryotes, including vertebrates, 
insects, fungi, and plants (Zemach and Zilberman 2010; 
Schmitz et al. 2019). In some of these groups, cytosines 
are methylated only in a single context when they are part 
of a CG dinucleotide. In plants, however, cytosine methyla- 
tion occurs in three sequence contexts—CG, CHG, and CHH 
(where H stands for A, T, or C). Methylation marks in these 
three contexts are produced by different biochemical path- 
ways and have different patterns of inheritance. For ex- 
ample, epimutation accumulation lines in Arabidopsis 
thaliana have demonstrated that genome-wide methylation 
divergence at CG dinucleotides increases throughout >30 
generations (van der Graaf et al. 2015), illustrating that 
plant CG DNA methylation is transmitted from generation 
to generation and epimutates over time (Yao et al. 2021). 
In contrast, CHH methylation is mostly erased by demethy- 
lation in the A. thaliana male germline and later reset during 
embryonic development (Calarco et al. 2012). Therefore, 
CHH methylation is only transmitted partially over, at 
most, one or a few generations (with the interesting excep- 
tion of some asexual plants without meiosis; Boquete et al. 
2021). The transgenerational inheritance of the third con- 
text—CHG methylation—remains unclear. Although CHG 
methylation is retained during gametogenesis (Calarco 
et al. 2012), epimutation accumulation lines in A. thaliana 
do not diverge for CHG methylation over generations (van 
der Graaf et al. 2015), suggesting that CHG methylation is 
not inherited at a genome-wide scale. It is possible, how- 
ever, that some genomic sites inherit CHG methylation 
over a few generations, especially in some asexual species 
(Boquete et al. 2021). To summarize, of the three methyla- 
tion contexts in plants, methylation in CG dinucleotides is 
most prone to transgenerational inheritance and Is there- 
tore the best candidate for epigenetic adaptation. 
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To consider the possibility of epigenetic adaptation, it Is 
also important to know where these marks reside in the 
genome. In flowering plants, patterns of DNA methylation 
vary among genomic regions. Methylation in all three con- 
texts silences transposable elements (TEs) and prevents ac- 
tivity at regulatory elements (Luo et al. 2018; Schmitz et al. 
2019). Both CHG and CHH genic methylation types are as- 
sociated with reduced expression levels, as is CG methyla- 
tion In promoter regions (Zhang et al. 2006; Niederhuth 
et al. 2016). In contrast, the exons of some genes (~20% 
of A. thaliana genes; Takuno and Gaut 2012) are methy- 
lated only in the CG context, a phenomenon called gene 
body methylation (hereafter gbM). gbM is mostly found 
in moderately and constitutively expressed housekeeping 
genes (Zhang et al. 2006; Neri et al. 2017; Schmitz et al. 
2019). However, since its initial discovery, the topic of 
gbM function has been controversial (Zhang et al. 2006; 
Teixeira and Colot 2009; Bewick et al. 2017; Zilberman 
2017). If it has no function, it is obviously unlikely to con- 
tribute to adaptive processes. 

Here, we review our current knowledge of gbM in 
plants, with the ultimate goal to critically evaluate whether 
it has a function and may be a target for natural selection. 
We start by presenting the mechanisms of plant gbM estab- 
lishment and maintenance because these mechanisms are 
crucial tor understanding how selection could act on this 
epigenetic state. We then consider the evolution of gbM, 
specifically whether gbM is a neutral manifestation of epi- 
genomic dynamics or whether there is evidence that it 
can be advantageous. Adaptive arguments presume that 
goM has a phenotype on which natural selection can act. 
Some, but certainly not all, recent work has established a 
connection between gbM and gene expression, but ques- 
tions about generality and mechanisms remain. To address 
these questions, we review Tunctional analyses of mutants 
and also comparative epigenomic approaches that have 
Studied hypothetical functions of goM, namely, its potential 
role in regulating and stabilizing expression, preventing ab- 
errant transcription, and improving the fidelity of intron 
splicing. Finally, we present a model synthesizing the preva- 
lence, distribution, and effect of goM with Its potential evo- 
lutionary significance. 


GbM Establishment and Maintenance 
Mechanisms 


CG methylation is maintained during plant cell division by 
Methyltransferase 1 (MET1), which adds a methyl group 
on the symmetrical CG dinucleotide of a complementary 
DNA strand (Kawashima and Berger 2014). Epimutation ac- 
cumulation lines in A. thaliana have shown that the main- 
tenance of CG methylation by MET1 is an inherently 
error-prone process, with the epimutation rate estimated 
to be ~10-° per generation per haploid epigenome for 
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the loss of CG methylation in genes (van der Graaf et al. 
2015). Without methylation maintenance mechanisms, 
CG methylation is quickly diluted and lost over cell divisions, 
as demonstrated by the absence of CG methylation in A. 
thaliana met? mutants (Cokus et al. 2008). 

Studies on Eutrema salsugineum, a close relative of A. 
thaliana, have recently clarified the mechanisms respon- 
sible for the establishment of gbM in plants (Bewick et al. 
2016). Eutrema salsugineum lacks both goM and the 
Chromomethylase 3 (CMT3) gene. The link between 
CMT3 loss and the absence of goM was at first enigmatic 
because, until recently, CMT3 was not known to methylate 
CG sites. It is now known that the CMT3 protein is involved 
in a self-reinforcing teedback loop: CMT3 recognizes the 
histone mark H3K9mez2 (histone H3 lysine 9 dimethylation) 
and then de novo methylates nearby cytosines predomin- 
antly in the CHG context but also occasionally in the CG 
context. CHG DNA methylation in turn leads to H3K9 
methylation by SU(VAR) Homologue 4 (SUVHA4), leading 
to a positive feedback loop between CHG methylation 
and H3K9 methylation Johnson et al. 2007). CHG methy- 
lation typically suppresses transcription; however, in A. 
thaliana, CHG methylation is removed in transcribed genes 
due to active demethylation of H3K9 by /ncreased in Bonsai 
Methylation 1 (IBM1) (Saze et al. 2008; Miura et al. 2009). 

The joint loss of CWT3 and gbM evolved independently 
in two Brassicaceae species, corroborating their associ- 
ation. However, cmt3 mutants in A. thaliana have shown 
that CMT3 does not affect the maintenance of goM once 
it is established (Stroud et al. 2013), suggesting the action 
of CMT3 is limited to gobM establishment (Bewick et al. 
2016; Niederhuth et al. 2016). Interestingly, transgenic re- 
insertion of CMT3 Into E. salsugineum re-established genic 
methylation in all three contexts in a subset of genes 
(Wendte et al. 2019). This subset of genes has been called 
“CHG-gain” genes, and these genes tend to be ortholo- 
gous to gbM genes in A. thaliana (Wendte et al. 2019). 
After the CMT3 transgene was lost, CHG-gain genes only 
maintained methylation in the CG context, presumably 
due to maintenance by MET1 (Wendte et al. 2019). On 
average, CHG-gain genes are longer, contain more exons, 
and exhibit a moderate—but on average higher—level of 
expression than non-CHG-gain genes (Wendte et al. 
2019). CHG-gain genes are also enriched for CWG trinu- 
cleotides (CAG and CTG) as opposed to CCG trinucleo- 
tides, consistent with the preferred substrate of CMT3 
(Goull and Baulcombe 2016; Stoddard et al. 2019). 
Finally, CHG-gain genes have a higher frequency of CHG 
cytosines compared to non-CHG-gain genes (Wendte 
et al. 2019). 

The role of CMT3 in genic de novo methylation was re- 
cently confirmed in A. thaliana mutants that hyperexpress 
CMT3 during late embryonic development (Papareddy 
et al. 2021). CMT3 hyperexpression induces embryonic 
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hypermethylation predominantly in the CWG context, but 
hypermethylation is also found in other contexts, including 
CG dinucleotides. These findings confirm that CMT3 Is 
Sloppy and can methylate contexts other than CHG. 
Methylation changes caused by embryonic CMT3 hyperex- 
pression were maintained over cell divisions and still ob- 
served in 3-week-old plants, consistent with the model 
that CMT3-induced epimutations give rise to gobM that 
can be maintained by MET1 across cell divisions and gen- 
erations. The same gene patterns were repeatedly observed 
in independent transgenic lines, confirming that CMT3 hy- 
permethylation is not stochastic and tends to target a spe- 
cific gene set (Papareddy et al. 2021). CMT3-induced 
hypermethylation was also enriched in genes characterized 
by inaccessible chromatin marks and heterochromatin his- 
tone variants (Papareddy et al. 2021). Altogether, these ob- 
servations lead to a model in which gbM establishment is 
caused by the recruitment of CMT3, the formation of a 
teedback loop that ultimately produces CHG, CHH, and 
CG methylation, the eventual removal of CHG and CHH 
methylation, and the maintenance of the remaining CG 
methylation by MET1 (fig. 1). 


gbM Gene Characteristics and Evolution 


gbM genes are typically defined statistically as being signifi- 
cantly more methylated than the genic average in the CG 
context and significantly less methylated than the genic 
average In the CHG and CHH contexts (Takuno and Gaut 
2012). Once defined, the proportion of goM genes varies 
greatly across species, with as many as ~60% of genes in 
Mimulus guttatus but 0% in Marchantia polymorpha, 
Physcomitrella patens, and E. salsugineum (Niederhuth 
et al. 2016; Takuno et al. 2016; Niederhuth and Schmitz 
2017). The lack of goM in a few species has been used to 
argue that gbM is dispensable and thus has no function 
(Bewick et al. 2016, 2019). However, the loss of gbM ina 
Tew species does not imply that it is nonfunctional in all 
plants (Zilberman 2017). 

A remarkable feature of goM Is that it is enriched over a 
conserved set of orthologs among species as distantly re- 
lated as ferns and angiosperms (Takuno and Gaut 2013; 
Seymour et al. 2014; Niederhuth et al. 2016; Takuno 
et al. 2016; Seymour and Gaut 2020). Two alternative hy- 
potheses can explain the remarkable conservation of 
gbM. The first is the biased establishment of gbM in a sub- 
set of specific genes with inaccessible chromatin marks and 
heterochromatic (H3K9me2) histone variants (Wendte 
et al. 2019). If these biases are conserved across species, 
they could explain the distribution of goM across both 
genes and species. This first hypothesis is neutral with re- 
spect to selection because it does not assume that gobM 
has any effect on fitness. Instead, in this scenario, goM Is 
a consequence of CMT3 activity that Is retained and 
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Fic. 1.—The establishment and maintenance of gbM in plants. The 
DNA Is represented as a line coiled around nucleosomes. Red dots indicate 
methylated H3K9 tails. CG, CHG, and CHH DNA methylation are drawn as 
black, gray, and white lollipops, respectively. (a and b) CMT3 induces de 
novo methylation at CHG sites of genes associated with inaccessible chro- 
matin marks and heterochromatin histone variants (Papareddy et al. 2021). 
(b and c) CHG-H3K9mez2 self-reinforcing feedback loop is then established. 
(cand d) CMT3 preferentially de novo methylates CWG sites but to a lesser 
extent also methylates other contexts, such as CG. (d and e) Demethylation 
of H3K9 by IBM1 is coupled to gene transcription. (f) After a few cell divi- 
sions, only CG methylation (mCG) remains due to MET1 maintenance. 





transmitted over generations by MET1 (Teixeira and Colot 
2009; Wendte et al. 2019; Papareddy et al. 2021). 

An observation in favor of the neutral hypothesis is that 
gobM genes share many characteristics of the CHG-gain 
genes described previously. That is, the genes targeted by 
CMT3 are like goM genes, in that they are generally charac- 
terized as being constitutively expressed at moderate levels 
and tend to be longer than unmethylated genes, with more 
exons and a higher trequency of CAG and CTG (as opposed 
to CCG) trinucleotides (Zhang et al. 2006; Lister et al. 2008; 
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Takuno and Gaut 2012, 2013; Bewick et al. 2016, 2017; 
Niederhuth et al. 2016; Takuno et al. 2016). Moreover, 
CHG-gain genes in E. salsugineum tend to be orthologous 
to goM genes in A. thaliana (Wendte et al. 2019). This ob- 
servation suggests gbM establishment is biased toward spe- 
cific genes, potentially explaining the conservation of gbM 
between orthologs (Wendte et al. 2019; Papareddy et al. 
2021). The overlap between CHG-gain and gbM is not com- 
plete because ~40% of CHG-gain genes in E. salsugineum 
are orthologous to a goM gene In A. thaliana (Wendte et al. 
2019). One explanation for the imperfect overlap between 
CHG-gain genes and gbM genes Is that they were defined 
in different species — that is, CHG-gain genes were defined 
in E. salsugineum and gbM genes in A. thaliana. Moreover, 
these two species diverged 47 Ma (Arias et al. 2014), which 
may be ample time for the targets of CMT3 to diverge. 
Finally, another plausible explanation is temporal. The 
CHG-gain genes in F. sa/lsugineum were established experi- 
mentally over only a few generations; the continuation of 
this experiment over a much longer timeframe could lead 
to the establishment of methylation within more genes, po- 
tentially increasing the 40% overlap. 

An alternative hypothesis is that the conservation of goM 
across genes and species is shaped in part by the action of 
natural selection (Zilberman 2017). Under this scenario, 
specific subsets of genes are targeted for de novo establish- 
ment of gbM, but selection on or against goM removes or 
maintains CG methylation in different gene sets. At least 
three observations support the hypothesis that some gobM 
is under selection. First, DNA methylation is mutagenic 
and elevates C to T substitutions (Bird 1980). Therefore, 
the conservation of gbM In a specific set of orthologous 
genes is surprising, especially because goM genes are gen- 
erally enriched tor housekeeping and other important func- 
tions and evolve more slowly than unmethylated genes 
(Takuno and Gaut 2012, 2013; Takuno et al. 2017: 
Seymour and Gaut 2020). This suggests the possibility 
that the mutagenic nature of methylation is compensated 
by an advantageous effect that maintains goM in specific 
genes (Ziloerman 2017). The second observation in tavor 
of the selective hypothesis comes from the comparison of 
the gbM status In orthologous genes of eight grass species, 
where shifts in the goM status of genes are almost exclusive 
to the tips of the phylogeny (i.e., In a single species) 
(Seymour and Gaut 2020). This pattern suggests that shifts 
in goM are deleterious and generally not favored over evo- 
lutionary time (table 1); however, it is also possible that the 
pattern is driven by epimutational biases. 

The third observation is based on population genetic 
analyses because selection acting on gbM can be explicitly 
measured using DNA methylation variation among natural 
populations of a species. Indeed, if goM is advantageous 
within a given gene, an unmethylated allele will be disad- 
vantageous and removed by selection. In such a gene, 
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only a small proportion of individuals should be observed 
with an unmethylated allele. To infer the intensity of selec- 
tion, Charlesworth and Jain (2014) constructed a popula- 
tion model that relies on the site frequency spectrum 
(SFS) of epigenetic states. In an inspired analysis, Vidalis 
et al. (2016) applied this model to the SFS of CG sites within 
all genes of a sample of 92 A. thaliana individuals. They did 
not detect a deviation from neutrality (table 1), but this re- 
sult came with two important caveats. The Tirst is that the 
test Is unlikely to be powerful with a small sample, particu- 
larly if goM has a small impact on fitness. Vidalis et al. 
(2016) used the sample of 92 individuals that was available 
at the time, but larger samples now exist. The second is that 
CG methylation within genes is not limited to godM genes 
but can also be found in genes that are methylated in all 
three contexts (i.e., TE-like methylation; Kawakatsu et al. 
2016). Methylation tn all three contexts within a gene can 
be caused by a nearby TE insertion, is known to suppress ex- 
pression, and may be an indication of pseudogenization. 
Therefore, most genes with TE-like methylation are likely 
to be under different evolutionary pressures than gobM 
genes, such that analyzing both goM and TE-like methy- 
lated genes together, as done by Vidalis et al. (2016), is like- 
ly to confound opposing selection pressures. 

For all these reasons, we recently repeated the analyses 
of Vidalis et al. (2016) with the important difference that 
we separated gobM gene sets from any genes with TE-like 
methylation (Muyle et al. 2021). We also relied on larger 
data sets—that is, two distinct subsets of 876 and 120 in- 
dividuals that originated from different sources—from the 
1001 methylomes project in A. thaliana (Kawakatsu et al. 
2016). To assess whether selection acts on the gbM state, 
we characterized the population frequency of methylation 
at the gene level to estimate the SFS of gene allelic states. 
Using the population genetic model of Charlesworth and 
Jain (2014), we inferred that genes with ancestral goM in 
Brassicaceae were under significant selection to remain 
CG methylated in A. thaliana (table 1; Muyle et al. 2021) 
based on the larger data set. Conversely, ancestrally un- 
methylated genes in Brassicaceae were under selection to 
remain unmethylated in A. thaliana. We repeated the ana- 
lyses on the smaller data set and also on an SFS drawn at the 
level of individual cytosines. The former had similar trends 
as the larger data set but without a significant effect of 
goM, and the latter corroborated our gene-level analyses. 
That is, the overall impression is that CG sites within ances- 
trally goM genes in Brassicaceae have been under selection 
to remain methylated in A. thaliana, while CG sites within 
ancestrally unmethylated genes have been under selection 
to be unmethylated in A. thaliana (Muyle et al. 2021). 
Importantly, the results were also confirmed after splitting 
the gene sets into CHG-gain and non-CHG-gain genes, as 
characterized by Wendte et al. (2019), showing that biases 
in epimutation rates between gene sets were not 
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completely responsible for the inferred selection acting on 
gbM. In other words, this control using CHG-gain genes 
shows that cis effects (either genetic or epigenetic) that lo- 
cally influence epimutation rates do not explain the interred 
selective pressures. 

Like all evolutionary analyses, there are caveats to this 
analysis, too. First, it relles on a model that simplifies the 
evolutionary process and includes assumptions that do 
not strictly fit the study organism (e.g., the model assumes 
random mating, but A. thaliana is selt-fertilizing). Second, 
there is always the possibility that results are driven by sam- 
pling effects, including demographic history, although 
using two data sets and separate partitions of those data 
sets somewhat discounts that notion here. Third, and per- 
haps most importantly, it is difficult to disentangle genetic 
trom epigenetic effects. Overall, however, this work sug- 
gests that goM has a measurable effect on fitness. The es- 
timated selection coefficients were small (4N.5 = 1.4) but 
nonetheless similar to the magnitude of selection acting 
on codon usage that has been measured in A. thaliana, 
A. lyrata, and Capsella rubella (Qiu et al. 2011). 

One interesting feature of codon bias, a phenomenon 
widely accepted as a genomic feature that is under weak 
selection, is that it varies among species, with selection de- 
tectable in species with large historical population sizes but 
not detectable in small Ne species (Galtier et al. 2018). An 
overarching feature of gbM Is that much of the experimen- 
tal and comparative work on gbM has focused on A. thali- 
ana. It is worth noting that this species may be atypical in at 
least three respects. First, two independent studies relying 
on different data sets and approaches have inferred that 
A. thaliana has lost goM three times faster than gaining It 
(Takuno et al. 2017; Muyle et al. 2021) relative to closely re- 
lated outcrossing species. Second, the recent shift of A. 
thaliana to an inbreeding mating system reduced Its effect- 
ive population size (Mattila et al. 2020), which is likely to 
have weakened the efficacy of selection on gbM in that 
species. Finally, methylation mutants usually have little 
phenotypic effect in A. thaliana, whereas they are often le- 
thal in taxa with higher TE load, such as maize (Li et al. 
2014). Together these observations suggest that A. thall- 
ana may not be the best model for measuring the evolu- 
tionary effects of methylation, and yet there is still some 
evidence that selection acts on gbM in that species, raising 
the possibility that the effects of goM may be more pro- 
nounced in other species. For this reason, we advocate 
that similar analyses are extended to other taxa when large 
methylation data sets become available. 


Does gbM Affect Gene Expression? 


Given that there are some Indications that goM may be un- 
der weak selection, one naturally wonders what Its function 
might be. One consistent hypothesis has been that goM 
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Note.—Many studies contradict one another, suggesting that if goM indeed has a function, its effect must be subtle. 
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affects gene expression levels. This hypothesis first came 
trom the observation that genic methylation levels across 
genes within A. thaliana are associated with expression le- 
vels: methylated genes tend to be intermediately to highly 
expressed, with lower expression variance among tissues 
(Zhang et al. 2006; Zilberman et al. 2007; Takuno and 
Gaut 2012). These patterns have been interpreted in two 
ways: either gobM might affect expression patterns (fig. 
2a) or, conversely, active transcription might drive gbM 
(Teixeira and Colot 2009). Many highly expressed genes 
do not have gbM in A. thaliana (Zhang et al. 2006; 
Zilberman et al. 2007), an observation that discounts the 
second hypothesis or at least suggests that the relationship 
is not completely straightforward. Moreover, It is now 
known that CMT3 does not depend on gene expression 
to methylate genes but instead on inaccessible chromatin 
marks and heterochromatin histone variants (Wendte 
et al. 2019; Papareddy et al. 2021), although it remains pos- 
sible that the initial recruitment of CMT3 requires or de- 
pends on gene expression. 

One difficulty in assessing the effect of goM on gene ex- 
pression comes from the possible confusion between gen- 
etic and epigenetic effects. Indeed, variation in gene 
expression can be caused by numerous tactors—for ex- 
ample, by nearby single nucleotide polymorphisms (SNPs) 
in regulatory sequences, by a nearby TE insertion (genetic 
cis effects), by a change In a transcription factor (trans et- 
fects), or by a change in the gene DNA methylation level 
(epigenetic cis effects). Genome-wide association studies 
(GWASs) and epigenome-wide association studies have 
shown that DNA methylation variants associated with ex- 
pression variation are often in linkage disequilibrium with 
nearby SNPs (Kawakatsu et al. 2016), making it difficult 
to disentangle the respective contribution of SNPs and 
methylation variation on gene expression. However, 
Meng et al. (2016) found a significant association between 
cis-methylation and gene expression in hundreds of genes 
across 135 A. thaliana accessions. Interestingly, goM was 
positively correlated with gene expression, and the effect 
remained significant after controlling for SNPs. Overall, 
the number and magnitude of the affected loci by DNA 
methylation were smaller than the effect of SNPs, and 
hence, the authors concluded that DNA methylation has 
limited effects on expression variation (table 1; Meng 
et al. 2016). 

The association between gbM and expression was fur- 
ther tested experimentally in epigenetic recombinant in- 
bred lines (epiRILs) obtained through the cross of a met? 
mutant and wild-type (WT) A. thaliana, followed by eight 
generations of inbreeding (Reinders et al. 2009). The result- 
ing epiRILs have a mosaic methylome, with some regions 
derived from the met7 mutant that originally lacked gobM 
and other regions containing CG methylation derived 
trom the WT parent. Bewick et al. (2016) inferred 
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differentially expressed genes between the met?-derived 
regions of epiRILs and their WT homologs. They found 
only 6 out of 3,471 genes that were gbM in WT plants 
and became differentially expressed when located in met7- 
derived regions in epiRiLs. On the other hand, they found 
significantly more genes (46 out of 3,124, P-value=2.55 
x 107?) that were unmethylated in WT plants and became 
differentially expressed when located in met?-derived re- 
gions in epiRILs. Taken together, these results suggest 
that gbM loss has little, if any, effect on gene expression 
(table 1). However, if goM has a small effect on expression, 
it is likely to have been missed by differential expression 
analyses that typically detect individual genes with twofold 
or more expression differences. More subtle effects may be 
Statistically detectable only by approaches that summarize 
trends across multiple genes. More importantly, the met? 
mutant might be a poor system to study the association be- 
tween gene methylation and expression level because both 
methylated and unmethylated genes were upregulated in 
met? mutants using microarray data (Ziloerman et al. 
2007). 

Another approach to test for associations between gbM 
and expression has been to use comparative and evolutionary 
rather than experimental approaches. For example, several 
Studies have compared expression between gbM-deprived 
E. salsugineum with its close relative A. thaliana, but the re- 
sults have been controversial. In the first study, Bewick 
et al. (2016) estimated the expression of unmethylated 
F. salsugineum genes that are orthologous to gbM genes 
in A. thaliana and found no difference in expression be- 
tween species (table 1). Muyle and Gaut (2019) reanalyzed 
the data from Bewick et al. (2016) and used genes that 
were unmethylated in both A. thaliana and E. salsugineum 
as a negative control to measure the average difference in 
expression between the two species. When taking into ac- 
count this species effect in a linear model, gobM loss in E. sal- 
sugineum was associated with a small but significant 
decrease in expression (table 1). In the third study using 
the same data, Bewick et al. (2019) disagreed on the use 
of unmethylated genes as a negative control because 
they have been shown to have more variable expression le- 
vels over evolutionary time and again found no effect of 
gbM loss on gene expression. 

Another effort compared gobM and unmethylated genes 
between A. thaliana and A. lyrata (Takuno et al. 2017). 
Methylated genes were expressed at significantly higher le- 
vels, on average, and with less variation between species 
than non-CG-methylated genes. The authors identified 
genes that changed methylation status between A. thaliana 
and Arabidopsis lyrata to examine whether the shift in 
methylation correlated with gene expression. They found 
that genes that had gained gbM in one of the two species 
also tended to shift toward higher expression levels, but 
these results were not statistically significant (table 1). 
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However, genes that differed in gbM status between A. 
thaliana and A. lyrata exhibited significantly higher variance 
in expression between species than genes that were gbM in 
both species (Takuno et al. 2017), consistent with previous 
Studies suggesting gobM modulates expression variability 
(Zilberman 2017). Another comparative study compared 
the methylomes of eight grass species and found that 
genes that were gbM in all eight species tended to have 
higher and less variable expression compared to genes 
that varied in their methylation state across species 
(Seymour and Gaut 2020). Although the effect was very 
small, the results suggested a positive effect of goM on ex- 
pression level and expression stabilization (fig. 2c). It Is 
worth emphasizing, however, that this approach, like 
most comparative approaches, cannot determine causality. 
More recently, we used the 876 A. thaliana methylomes to 
study the association between gobM and gene expression 
within a species by comparing the methylation state of al- 
leles both to their expression level and to the variability in 
expression across the larger data subset of 876 A. thaliana 
methylomes (Muyle et al. 2021). Across genes with poly- 
morphic methylation states, the expression of gbM alleles 
was consistently and significantly higher than unmethy- 
lated alleles (table 1). Taken across the entire genome, 
gbM alleles also had a significantly less variable expression 
level compared to unmethylated alleles of the same gene 
(Muyle et al. 2021). Although consistent across the thou- 
sands of genes in the data set, the effect was quite small: 
on average, a methylated allele had ~1 more RNA-seq 
read than an unmethylated allele. A weakness of this 
work Is that it did not disentangle potential genetic effects 
trom epigenetic effects; however, the gbM effect did re- 
main consistent when models included a proxy for genetic 
variation, by including the number of CG dinucleotides in 
Statistical analyses. Consistent with our A. thaliana results, 
work on the outcrossing crucifer Capsella grandiflora has 
revealed that the presence of gbM is a major predictor of 
cis-regulatory constraint (Steige et al. 2017). GoM lowers 
the probability of allele-specific expression via cis- 
regulation, again suggesting a stabilizing effect of gobM 
on expression level. 

As we have just reviewed, several studies have estab- 
lished an association between gbM and stable expression 
level (Tig. 26 and c), which complement the proposal that 
gobM has a homeostatic effect on expression (Zilberman 
2017). This phenomenon has been further investigated by 
Horvath et al. (2019), who studied gene expression levels 
in A. thaliana roots via single-cell RNA-seg. They found no 
Significant correlation between gbM and gene expression 
noise (as measured by the variation in the expression level 
among single cells). However, goM was significantly posi- 
tively correlated with gene expression consistency, which 
they measured as the number of single-cell RNA-segq repli- 
cates in which the gene was expressed (Tig. 26). This effect 
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Fic. 2.—Potential goM functions and evolutionary consequences. Unmethylated genes, represented on the left column, are compared to gobM genes on 
the right. TSS stands for the transcription start site, TTS for the transcription termination site, and TE for the transposable element. (a) goM is hypothesized to 
upregulate gene expression. The number of mRNA molecules, represented by wav lines, illustrates the gene expression level. (6) godbM may stabilize gene 
expression by triggering consistent expression levels among the cells of a tissue. (c) goM may stabilize gene expression, as seen by the more constant and 
conserved expression levels observed among species. (’) goM could prevent aberrant internal and reverse transcription by silencing alternative promoters 
within genes. goM might also inhibit aberrant TTS. These hypotheses are coherent with the typical depletion of CG methylation observed around the TSS 
and TTS of genes. (e) gobM is hypothesized to facilitate correct splicing and prevent aberrant intron retention. (f) Some TEs preferentially insert into genes; 
however, goM may protect against deleterious insertions within genes. 
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remained after correcting other genomic features such as 
gene expression, gene length, gene conservation, and 
gene duplication status. Therefore, Horvath et al. (2019) 
found that gobM genes are more consistently expressed 
than unmethylated genes across cells of a tissue, which 
can be interpreted as implying that gbM is involved in the 
maintenance of a consistent gene expression (fig. 26). It 
this is true, the mechanism by which this happens remains 
unknown. One hypothesis comes trom the anticorrelation 
observed between genome-wide distributions of the 
histone variant H2A.Z and DNA methylation in A. thaliana 
(Zilberman et al. 2008). H2A.Z Is typically associated 
with transiently expressed response genes, such as 
immune-responsive or environmental stimulus-responsive 
genes (Coleman-Derr and Zilberman 2012). In met? mu- 
tants, the loss of DNA methylation was accompanied by a 
gain in H2A.Z deposition (Zilberman et al. 2008). 
However, in an h2a.z mutant, DNA methylation patterns 
were only minimally affected (Coleman-Derr and 
Zilberman 2012), suggesting that DNA methylation pre- 
vents H2A.Z incorporation but not the converse. Based 
on these observations, it has been proposed that gbM 
serves to stabilize transcription by preventing deposition 
of the histone variant H2A.Z (Coleman-Derr and 
Ziloerman 2012). 

Finally, Shahzad et al. (2021) have used quantitative trait 
loci (QTL) mapping to identity ~1,000 genes for which the 
proportion of methylated CG sites significantly correlates 
with expression level across a sample of over 900 natural 
A. thaliana accessions. The variance in expression explained 
by CG methylation is modest for most genes, but for some 
genes, It reaches levels comparable to the effect of SNPs on 
expression. goM is mostly positively correlated with expres- 
sion; in contrast, TE-like methylation (i.e., in all three 
contexts) Is, as expected, negatively correlated with expres- 
sion. In a clever extension to control the effect of linked 
SNPs, Shahzad et al. (2021) identified SNPs with significant 
effects on expression using GWAS. They then repeated the 
analysis linking CG methylation and expression within 
nested sets of accessions that carry the same GWAS allele. 
For the vast majority of genes, this approach contirmed the 
significant positive correlation between gbM and expres- 
sion level, either because there was no GWAS SNP or 
because at least one nested sample had a significant correl- 
ation. A second control analyzing haplogroups, which cor- 
rects for cis genetic variation, led separately to the same 
conclusion. They then studied gene expression in the 
met? A. thaliana mutant without gbM, and they found as 
expected a reduced expression level in genes with these 
three characteristics: (1) genes with a significant positive 
correlation between CG methylation and expression level, 
(2) genes that were methylated in the WT Col-0O accession 
(which is the accession used for the met? mutant), and 
(3) genes for which the effect of goM was not confounded 
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by linked genetic variants. The authors concluded that goM 
is positively correlated with gene expression in hundreds of 
genes, independently of local genetic variants. 

Although Shahzad et al. (2021) did attempt to account 
tor trans effects as well as cis effects, a shortcoming of 
most of the comparative studies referred to in this section 
is that they do not account for possible trans genetic effects 
on gene expression, which may result in overestimated cis 
epigenetic effects. Altogether, however, evolutionary and 
comparative studies tend to tind small but detectable rela- 
tionships between gbM and either gene expression levels or 
gene expression variation. These results contrast with many 
direct experimental measurements of goM based on A. 
thaliana mutants. If the comparative conclusions are cor- 
rect, they are important because they suggest that goM 
has a phenotype that may be the target of natural selection. 


Potential Effects of gbM on Internal and 
Reverse Transcription 


To date, studies have been inconsistent as to whether goM 
associates with gene expression (table 1). When it does as- 
sociate with expression, it is also difficult to disentangle 
cause from effect. If, however, we assume there Is a real re- 
lationship between gbM and gene expression, there re- 
mains an open question: what is the mechanism(s) by 
which gbM affects expression? One hypothesis is that 
goM improves transcription by regulating alternative pro- 
moters within gene bodies, thereby potentially preventing 
aberrant internal and/or antisense transcription (fig. 2d; 
Tran et al. 2005; Maunakea et al. 2010). This hypothesis 
stems trom the observation that CG methylation is typically 
depleted within active promoter regions (Feng et al. 2010) 
and also that genes with CG-methylated promoters are si- 
lenced (Niederhuth et al. 2016). Aberrant transcription, 
whether in the sense or antisense orientation, is expected 
to be deleterious because It is energetically costly and leads 
to the accumulation of both unnecessary transcripts and 
truncated proteins that can be toxic for the cell. Aberrant 
antisense transcription is expected to disturb gene expres- 
sion because RNA polymerases coming from both direc- 
tions may collide. Moreover, the RNA-directed DNA 
methylation pathway can be activated by the pairing of 
sense and antisense transcripts into double-stranded RNA 
(Tran et al. 2005), which may further prevent gene expres- 
sion. Hence, if goM prevents aberrant reverse transcription, 
it could explain the aforementioned association between 
goM and gene expression. 

However, the results of tests for this effect have been in- 
consistent (table 1). Some of these tests have taken place in 
mammalian systems because they too exhibit CG methyla- 
tion within genes (Yi 2017), even though they mostly do 
not methylate in the CHG and CHH contexts. For example, 
Neri et al. (2017) studied mouse embryonic stem cells in 
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DNA methyltransferase 3b (Dnmt3b) mutants that lack 
gobM and compared them to WT. To quantify internal tran- 
scription, the authors used a ratio of the number of 
RNA-seg reads that map to the third exon divided by the 
number of reads mapping to the first exon (hereafter 
exon3/exon1). This ratio is expected to increase when there 
Is cryptic intragenic initiation of transcription. Neri et al. 
(2017) found that the loss of goM was accompanied by 
higher exon3/exon1 ratios than WT mice, suggesting an In- 
crease of spurious internal transcription between exons 1 
and 3 when gbM is lost (table 1). The results were con- 
firmed by a sequencing technique that allows characteriza- 
tion of the exact position of the 5’ end of mRNAs by 
targeting the mRNA cap (DECAP-seq). However, 
Teissandier and Bourc’his (2017) performed similar analyses 
in Dnmt triple mutants on highly expressed genes, and they 
were unable to corroborate the findings of Neri et al. (2017) 
(table 1). Results from Teissandier and Bourc’his (2017) sug- 
gest that the role of goM in suppressing spurious transcrip- 
tion initiation may be specific to the lack of DNMT3B, but 
only while other DNMTs are still present. 

Similar work has sought evidence for an effect of goM 
on aberrant transcription in plants. For example, Bewick 
et al. (2016) compared met?-derived regions of A. thaliana 
epiRILs with orthologous WT regions. They quantified anti- 
sense transcription and found that gbM loss did not lead to 
an increase in differentially expressed antisense transcripts 
(table 1). However, Choi et al. (2020) detected that the ex- 
pression of antisense transcripts was activated in 938 genes 
infh1, met? double mutants compared to WT. The number 
of upregulated antisense transcripts was comparatively low 
in single mutants when compared with WT (145 and 34 for 
met? and h7, respectively), suggesting redundancy in H1 
and MET1 repression of antisense transcription. This finding 
demonstrates that, at least for some genes, goM may re- 
press antisense transcription in A. thaliana jointly with his- 
tone H1 (table 1). This study also again exemplifies 
redundancy among DNA methylation, histone variants, 
and histone marks. These different epigenetic marks are 
interdependent and play overlapping roles in the cell, com- 
plicating the characterization and inference of potential 
gbM effects. 

More recently, Li et al. (2021) used RNA long reads se- 
quenced by Oxford Nanopore Technology Direct 
Sequencing (ONT DRS) to characterize transcription start 
sites (TSSs) in A. thaliana. They found that the met7—3 mu- 
tant, which lacks CG methylation, has significantly more 
unique TSSs compared to WT, and these unique TSSs oc- 
curred in regions where mutant methylation was lower 
than WT. These results suggest that goM can prevent the 
initiation of aberrant transcription. The transcription ter- 
mination site (TTS) was also affected by DNA methylation 
(Li et al. 2021). Indeed, the met/—3 mutant had a higher 
number of unique TTS than WT, indicating that CG 
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methylation also inhibits aberrant transcription termin- 
ation. Altogether, this work suggests that goM could en- 
Sure proper transcription of genes from start to end (Tig. 
2d). 

Here, we revisited this issue by analyzing lsoseq 
(PacBio RNA long read) data in maize and A. thaliana 
(Supplementary Materials and Methods, Supplementary 
Material online). We included maize because this is (to 
our knowledge) the first such attempt to examine this ques- 
tion in a plant other than A. thaliana. We focus on |soseq 
data because it can represent full-length MRNA, thanks to 
the selection of MRNAs that contain a 3’ poly-A tail and, 
in some cases, a 5’ cap (when sequencing Is done with a 
cap-trap step). The A. thaliana |soseg data set we analyzed 
has a 5’ cap so that most Isoseq reads likely represent full- 
length mRNAs (Supplementary Materials and Methods). 
Some of the A. thaliana data set was publicly available 
(Cartolano et al. 2016), and we also generated new 
lsoseq data for this study; in both cases, the data were gen- 
erated from Col-0O inflorescences. In contrast, the maize 
lsoseg data, which was generated on pooled RNA extracted 
from six tissues at different developmental stages of the 
B73 inbred line (Wang et al. 2016), were not generated 
with a cap-trap step. With both the maize and A. thaliana 
data sets, we considered aberrant internal transcription to 
be reflected in Isoseq reads that begin after the start of 
exon 1 (fig. 2d). For each gene, the proportion of Tull- 
length lsoseq reads with a “conventional” TSS (1.e., that be- 
gin prior to the start of exon 1) was computed and com- 
pared between gbM and unmethylated (UM) genes. goM 
and UM genes were categorized from publicly available 
methylation data (see Supplemental Materials and 
Methods, Supplementary Material online). 

In maize, we found that gbM genes had a significantly 
higher proportion of conventional TSSs (average 0.81) com- 
pared to UM genes (average 0.78, Wilcoxon test P-value = 
5.63 x 10~'' fig. 3a); superficially, this observation com- 
plies with the prediction that goM genes have less aberrant 
transcription. However, goM is known to be associated 
with more highly expressed and longer genes (Zhang 
et al. 2006; Takuno and Gaut 2012), and these covariates 
must be taken into account. Genes with higher expression 
had a significantly higher proportion of conventional TSSs 
(generalized linear model contrast estimate=0.042, 
Z-ratio = 69.15, P-value <2 x 107'®, Supplementary table 
S1, Supplementary Material online), perhaps reflecting 
higher selective pressures to remove aberrant transcription 
tor highly expressed genes. Longer genes had significantly 
tewer conventional TSSs (generalized linear model contrast 
estimate = —0.178, Z-ratio = —130.7, P-value <2 x 10° '° 
Supplementary table S1, Supplementary Material online), 
which could be attributable to the higher probability of a 
long gene harboring an aberrant internal promoter or an 
experimental artifact (1.e., longer genes may have a higher 
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chance of having their MRNA not Tully reversed transcribed 
during sequencing, leading to 5’ truncated transcripts and 
wrongly inferred aberrant TSS). Notably, however, only 
321 of 2059 detected nonconventional TSSs occurred tn in- 
trons. After taking gene length and gene expression Into ac- 
count in a generalized linear model (Supplementary 
Materials and Methods, eq. 6, Supplementary Material on- 
line), UM genes had a significantly higher proportion of con- 
ventional TSSs compared to gobM genes (generalized linear 
model contrast estimate =0.214, Z-ratio= 33.4, P-value 
<2x107'®, Supplementary table $1, Supplementary 
Material online). This result is not consistent with the expect- 
ation that goM prevents aberrant TSS (table 1). 

The A. thaliana results complement the maize results in 
some ways but not others. Since the data were generated 
with a 5’ cap, the overall proportion of conventional TSSs 
in the A. thaliana Isoseq data set was higher than in maize 
(Tig. 3a). goM genes had a lower proportion of convention- 
al TSSs (mean 0.90) compared to UM genes (0.96). 
However, after taking gene length and gene expression 
into account In a generalized linear model, goM genes 
did have significantly more conventional TSS compared to 
UM genes (P-value <2 x 107'®, Supplementary table S2, 
Supplementary Material online). We conclude that the 
lsoseq data do reflect some advantage of gbM in terms of 
avoiding internal transcription start in WT A. thaliana but 
not in maize (table 1). 

We explored these ideas further by turning to a different 
approach that relies on RNA-seq reads in goM mutants. 
Because 5’ cap Isoseq data are not available for methylation 
mutants, we inferred internal transcription starts using the 
RNA-seg coverage ratio of exon3/exon1, following the ap- 
proach of Neri et al. (2017) (see Supplementary Materials 
and Methods, Supplementary Material online). If goM pre- 
vents aberrant internal transcription start, this ratio should 
increase in goM mutants relative to WT. We therefore mea- 
sured exon3/exon1 RNA-seg coverage in WT and goM mu- 
tants. We performed this comparison for two data sets 
based on two different goM mutants. In data set 1, 
RNA-seg data were generated on 13-day-old seedlings for 
three replicates of WT controls that were compared to three 
replicates of met7—3 mutants (Zhang et al. 2017). In data set 
2, RNA-seg data were generated on leaf tissue for three re- 
plicates of WT controls (these differed from the controls 
within data set 1) that were then compared to five replicates 
of met1,sdg7-8 triple mutants (Bewick et al. 2016). In WT 
plants, goM genes had a lower exon3/exon1 coverage ratio 
compared to UM genes (fig. 36), suggesting superficially 
that goM could prevent internal transcription start. This 
was observed consistently for WT plants from data set 1 
(mean exon3/exon1 coverage 0.662 in gbM genes, 1.919 
in UM genes, one-sided Wilcoxon test P-value <2.2 x 
10~'®) and also from data set 2 (mean exon3/exon1 cover- 
age 1.013 In gbM genes, 3.684 in UM genes, one-sided 
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Wilcoxon test P-value=1.3 x 107''). However, this same 
difference between gbM genes (as defined in WT plants) 
and UM genes was also apparent in mutant plants that 
lacked gbM (fig. 36). That is, goM genes had a lower 
exon3/exon1 ratio compared to UM genes in met?—3 mu- 
tants (0.512 vs. 1.876, one-sided Wilcoxon test P-value 
<2.2 x 10~'®) and in met1,sdg7-8 triple mutants (1.232 
vs. 1.566, one-sided Wilcoxon text P-value <2.2 x 107 '®). 
These observations suggest that the fact that goM genes 
have lower exon3/exon1 coverage in RNA-seq data is not 
due to their methylation state alone because the same pat- 
tern is observed in mutants without goM. While one must 
always be careful that mutants can be complex and may re- 
tlect other (unknown) effects, the data again provide little 
support for the notion that goM alone prevents aberrant in- 
ternal transcription initiation in A. thaliana genes (table 1). 
Interestingly, Choi et al. (2020) showed that goM and H1 
play a redundant role in inhibiting aberrant reverse tran- 
scription, and Martin et al. (2021) hypothesized that an- 
other epigenetic mark, CHH islands, may have some 
redundant function with goM. These redundancies could 
explain why we did not detect any change In internal tran- 
scription start in A. thaliana goM mutants because these re- 
dundant epigenetic marks may have been functioning in 
gobM mutants, thus complicating inferences about gbM 
effects. 

Another aspect of aberrant transcription, aside from in- 
ternal transcription initiation, Is reverse transcription. We 
again used the Isoseq data trom A. thaliana to estimate 
the proportion of full-length antisense reads, which corres- 
pond to reverse transcription events (see Supplementary 
Materials and Methods, Supplementary Material online). 
gobM genes had a significantly lower mean proportion of 
antisense reads (average 0.0046) compared to UM genes 
(average 0.016, Wilcoxon test P-value=3.16 x 107 '%). 
This result held after accounting for gene length and gene 
expression in a generalized linear model (Supplementary 
table $3, Supplementary Material online). However, in maize 
Isoseg data, the goM genes had significantly more antisense 
transcription compared to UM genes (Supplementary table 
S4, Supplementary Material online). We conclude that there 
is evidence that goM prevents antisense transcription in A. 
thaliana based on Isoseq data from WT plants (table 1), 
but we have uncovered no such evidence in maize. 
Altogether, however, we believe there are enough compel- 
ling observations—both from animals and trom A. thaliana 
plants (Choi et al. 2020)—to suggest that further dissection 
of this potential function may be worthwhile. 


Assessing the Effect of gbM on Splicing 
Fidelity 


Another hypothesis is that goM improves splicing fidelity 
and prevents aberrant intron retention (fig. 2e), but this 
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Fic. 3—Novel analyses to assess the effect of goM on transcripts. (a) Proportion of full-length Isoseq reads with conventional TSS in goM and UM genes in 
maize and A. thaliana. lsoseg reads that started after the start of exon 1 were considered as nonconventional. (b) RNA-seq read coverage ratio between exon 3 
and exon 1 for goM and UM genes in A. thaliana. Internal transcription starts happening between exon 1 and exon 3 and is expected to increase the ratio of 
exon 3 to exon 1 coverage. (c) RNA-seg read coverage of introns (in RPKM) for goM and UM genes in A. thaliana. Pools of goM genes are drawn in red, and 
those of UM genes are drawn in turquoise. In data set 1, WT controls were compared to met7-3 mutants. In data set 2, other WT controls were compared to 
met1,sdg7-8 triple mutants. The boxplots show the median, the hinges are the first and third quartiles (the 25th and 75th percentiles), and the whiskers 
extend from the hinge to the largest or smallest value no further than 1.5 times the interquartile range (distance between the first and third quartiles). 


raises the question of how splicing fidelity may drive a rela- 
tionship between gbM and gene expression. One possibility 
is that poor splicing in the absence of gbM leads to the re- 
tention of introns (Tig. 2e) that contain premature stop co- 
dons. Aberrant transcripts containing premature stop 
codons are typically sent to the nonsense-mediated 
mRNA decay for destruction (Causier et al. 2017), which 
might in turn lower gene expression. This suggests a 
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potential relationship among gbM, splicing fidelity, and 
gene expression. 

The effect of godM on splicing fidelity has been tested 
across various taxa, and the results have been—like studies 
of aberrant transcription—somewhat inconsistent. In hon- 
eybee and mouse embryonic stem cells, for example, it is 
clear that alteration of DNA methylation impacts alternative 
Splicing (Lev Maor et al. 2015). In honeybees, DNA 
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methylation is predominantly on gene bodies in the CG 
context. A knockdown of the expression of dnmt3, which 
is required for de novo DNA methylation, decreased global 
genomic methylation level and caused widespread changes 
in alternative splicing in fat tissue (Li-Byarlay et al. 2013). In 
mouse embryonic stem cells, Yearim et al. (2015) con- 
structed an experimental system in which differential 
DNA methylation could be limited to a single gene while 
all other cellular factors remained identical. Using this sys- 
tem, they demonstrated a direct causal relationship be- 
tween DNA methylation and the recruitment of splicing 
factors. Patterns of methylation near splice sites have also 
been studied in maize, where CHG methylation of the spli- 
cing acceptor site is associated with a lower efficiency of 
splicing and CHH methylation does not correlate with spli- 
cing efficacy (Regulski et al. 2013). Surprisingly, however, 
the effect of CG methylation was not tested explicitly. 
Horvath et al. (2019) followed the maize work by measur- 
ing splicing fidelity in A. thaliana. They found that gobM 
was negatively correlated with the amount of RNA-seq 
reads that map to introns, suggesting that goM genes 
tend to retain fewer introns in their mRNA compared to 
UM genes. Similarly, Li et al. (2021) recently used ONT 
DRS to characterize splicing in A. thaliana. They found 
that retained introns had significantly lower CG methyla- 
tion levels around their splicing sites (both donor and ac- 
ceptor sites) compared to spliced introns in WT and some 
CHG and CHH methylation mutants. This suggests that 
gbM facilitates splicing. However, Bewick et al. (2016) 
found no evidence for this splicing effect when they com- 
pared met? epiRILs to WT plants. In fact, they found that 
WT goM genes retained significantly fewer intron reads 
than UM genes after they lost goM in the met? back- 
ground. This work suggests that this intron effect Is a prop- 
erty of the genes rather than gbM per se. 

Given contradictory results in the literature, we Turther 
tested the hypothesis that goM prevents aberrant intron re- 
tention using A. thaliana Isoseq data (Supplementary 
Material and Methods, Supplementary Material online). 
The proportion of full-length Isoseq reads that retained at 
least one intron was higher in goM genes (mean 0.149) 
compared to UM genes (mean 0.106). This result remained 
Significant after taking gene length and gene expression 
into account in a generalized linear model (table 1, 
Supplementary table S5, Supplementary Material online). 
We also measured intron RNA-seg coverage in WT and mu- 
tant A. thaliana plants that lack goM (Supplementary 
Material and Methods, Supplementary Material online). 
Similar to Horvath et al. (2019), we found that goM genes 
had a lower intron read coverage compared to UM genes in 
WT plants (fig. 3c) both in data set 1 (mean gbM genes in- 
tron coverage 61.51 RPKM, vs. 133.42 tor UM genes, one- 
sided Wilcoxon test P-value < 2.2 x 107 '®) andindataset2 
(mean gbM genes intron coverage 53.13 RPKM, vs. 208.68 
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in UM genes, one-sided Wilcoxon test P-value < 2.2 x 
10~'°). However, the difference between gbM genes and 
UM genes was also found in mutant plants that lack goM 
(Tig. 3c). Indeed, goM genes had a lower intron read cover- 
age compared to UM genes in met1—3 mutants (59.04 vs. 
131.65 RPKM, one-sided Wilcoxon test P-value < 2.2 x 
10~'®) and in met1, sdg7-8 triple mutants (51.27 vs. 
132.97 RPKM, one-sided Wilcoxon test P-value < 2.2 x 
10—'°). This again suggests that the fact that gobM genes 
have lower intron coverage in RNA-seg data is not due to 
their methylation state alone (table 1). Another possibility 
is again that some other epigenetic mark plays a redundant 
role with gbM in preventing aberrant intron retention. In 
Summary, there is not yet a clear consensus, or even a clear 
trend, as to whether gbM plays a role in splicing fidelity. 
However, we note again that most of the work in plants 
has focused on A. thaliana, and, as previously discussed, 
it may be helpful to extend these analyses to other species. 
The question of the potential role of goM in splicing fidelity, 
like its role in aberrant transcription, remains unresolved 
and will benefit from the broader and more comprehensive 
investigation. 


Potential Relationship between gbM 
and TE Insertion 


Another hypothesized function of gbM is that methylation 
protects against the insertion of some TEs (fig. 27). This hy- 
pothesis primarily stems from two studies in maize that fo- 
cused on Robertson's Mutator (Mu) transposons, which 
typically insert within or near genes and can be highly dele- 
terious by disrupting gene function. In the first study, Liu 
et al. (2009) found that Mu transposons insert preferentially 
within unmethylated regions of the B73 genome. However, 
the methylation context could not be determined, which 
motivated Regulski et al. (2013) to repeat the analyses 
with context-specific DNA methylation data. They found 
that Mu transposon insertion sites within genes were strong- 
ly depleted in CG-methylated regions (table 1), but these re- 
gions were not depleted in CHG nor CHH methylation 
relative to average gene methylation. This raises the possibil- 
ity that goM is beneficial because it deters transposon inser- 
tions. This hypothesis is also difficult to disentangle from 
covariates, particularly the observation that godM genes 
tend to be under stronger selective constraint than unmethy- 
lated genes, aS measured by nonsynonymous divergence 
(Takuno and Gaut 2012), although there Is conflicting evi- 
dence that gobM genes do (Niederhuth et al. 2016) or do 
not (Takuno and Gaut 2013) evolve more rapidly in the 
Poaceae. Nonetheless, it is possible that the apparent diftfer- 
ence in TE insertion reflects different strengths of selection 
on gbM versus unmethylated genes, rather than a direct ef- 
tect of goM on TE insertion rate. 
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It would be insightful to repeat analyses of the effect of 
DNA methylation on TE insertion in other plant species and 
with different TE types to test for the potential broader rele- 
vance of this idea. If this phenomenon occurs across diverse 
species and TE types, it could partially explain the link be- 
tween gbM and expression stabilization within and be- 
tween species. Indeed, a genic TE insertion might prevent 
proper gene expression because silencing of the TE by 
methylation in the three contexts might spread to the 
gene and affect expression (Choi and Lee 2020). Also, If 
the TE is inserted within an exon, It might lead to the ap- 
pearance of premature stop codons and the destruction 
of MRNA by nonsense-mediated mRNA decay. This would 
lead to more expression variation among accessions of a 
species and also among species (fig. 2c). 


Conclusions and Future Research 
Directions 
The molecular mechanisms leading to gobM establishment 


and maintenance In plants have been remarkably well elu- 
cidated (fig. 1). However, the function and potential 


importance of goM remain debated, as illustrated in this re- 
view by the numerous studies that are inconsistent or, in 
some cases, contradictory (table 1). The main sources for 
this persistent uncertainty come first trom the difficulty in 
disentangling epigenetic from genetic effects and second 
trom the complex system of redundancies and overlapping 
functions among gbM, histone variants, and other epigen- 
etic marks. These dependencies and redundancies 
undoubtedly complicate the interpretation of experimental 
mutants, as illustrated by contrasting results based on met? 
plants with or without A7 mutation (table 1; Bewick et al. 
2016; Choi et al. 2020). 

Despite these difficulties, there has been substantive 
progress toward understanding the effect, dynamics, and 
potential adaptive impact of gbM. In the last few years, sev- 
eral studies have concluded that gbM is associated with 
both the level and the variance of gene expression (table 
{PI}1). An interesting corollary of these observations is 
that many experimental efforts to measure this association 
based on A. thaliana mutants have yielded negative results 
(table 1). While these experiments may reflect reality, our 
view is that experimental approaches have nonetheless 
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suffered trom a few common shortcomings. First, we sus- 
pect (but certainly do not know) that the reliance on A. 
thaliana \s limiting; it is illogical to expect a strong experi- 
mental effect In a system that Is studied in part because 
methylation mutants are viable and thus may not have a 
strong effect relative to other plant systems. Second, as 
noted above, it is not clear that all mutants are equivalent 
because some mutants may be unsuitable for detecting 
specific effects due to dependencies and redundancies. 
Finally, it can be exceedingly difficult to identity subtle ef- 
fects using short-term experimental approaches. 
Unfortunately, however, the inability to detect an effect Is 
often incorrectly interpreted as the absence of an effect. 

In contrast, evolutionary and comparative approaches 
based on genetic diversity or species comparison have often, 
but not always, found an association between gbM and 
gene expression, particularly expression homeostasis (table 
1). These analyses also suffer from a number of potential 
drawbacks, including reliance on simplified models, discrete 
definitions of which genes are (or are not) goM, and an In- 
ability to disentangle causation from correlation. One poten- 
tial reason for detecting an effect using these approaches Is 
that even very subtle effects can accrue over time and thus 
be detected by evolutionary contrasts. It is difficult to estab- 
lish whether these associations are causal due to all the com- 
plex reasons cited above—that is, functional redundancies 
among epigenetic marks and difficulties in discriminating 
genetic trom epigenetic effects. Nonetheless, the apparent 
association between gbM and expression is important, be- 
cause it provides a potential phenotype on which selection 
can act. Although more investigation is needed to test 
whether gbM is shaped by selection, both phylogenetic 
and population genetic studies suggest that selection acts 
to maintain goM status in some genes. Moreover, popula- 
tion variation in goM has been shown to associate with fit- 
ness under water stress and selection for flowering time 
(Shahzad et al. 2021). These fitness effects appear to stem 
trom a correlation between gbM and gene expression, as de- 
monstrated by the fact that experimental modification of 
candidate gene expression affects the trait under study 
(Shahzad et al. 2021). Altogether, these results suggest 
that goM may affect fitness and phenotype through an ef- 
tect on gene expression, thereby potentially affecting species 
adaptation independently of genetic variation. We empha- 
size, however, that these effects are subtle, at best, and so, 
there is still much to learn, even about the simple question 
as to whether and when gbM associates with gene expres- 
sion (table 1). 

If there is selection on gbM itself—or on another epigen- 
etic feature that correlates with goM—then this fact may be 
the basis for an “important conceptual change in evolution- 
ary biology” (Cavalli and Heard 2019) because selection on 
epigenetic modifications has the potential to affect the 
timeframe of mutation and thus, potentially, of adaptative 
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change. As an example, figure 4 illustrates the tempo of epi- 
genetic versus genetic change. As we have noted, epigenet- 
ic change can be incredibly rapid. For example, CHH 
methylation is reset every generation, making It unlikely to 
be under direct selection because It is not transmitted trans- 
generationally. CHH methylation (and to some extent CHG 
methylation) is perhaps better described as tracking genetic 
(i.e., TE and CMT3) activity. In contrast, CG methylation is 
heritable and mutates approximately three and six orders 
of magnitude faster than gene duplication and nucleotides, 
respectively (Tig. 4). tis worth noting, however, that the rate 
of change of an entire gene or allele from goM to UM Is sub- 
stantially slower because it probably requires numerous 
changes of individual sites. The rate of change for an entire 
allele, based on our previous SFS analysis in A. thaliana po- 
pulations, is ~3 x 107’ per gene per generation (Muyle 
et al. 2021), an estimate based on models that require as- 
Sumptions about the effective population size but, if accur- 
ate, is still faster than the rate of nucleotide change. In 
theory, then, methylation variation may provide a rapid 
source of phenotypic novelty that could be subjected to nat- 
ural selection on rapid—and perhaps even ecological—time 
scales. 
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